228 research outputs found
Combined Acoustic and Pronunciation Modelling for Non-Native Speech Recognition
In this paper, we present several adaptation methods for non-native speech
recognition. We have tested pronunciation modelling, MLLR and MAP non-native
pronunciation adaptation and HMM models retraining on the HIWIRE foreign
accented English speech database. The ``phonetic confusion'' scheme we have
developed consists in associating to each spoken phone several sequences of
confused phones. In our experiments, we have used different combinations of
acoustic models representing the canonical and the foreign pronunciations:
spoken and native models, models adapted to the non-native accent with MAP and
MLLR. The joint use of pronunciation modelling and acoustic adaptation led to
further improvements in recognition accuracy. The best combination of the above
mentioned techniques resulted in a relative word error reduction ranging from
46% to 71%
DNN-Based Semantic Model for Rescoring N-best Speech Recognition List
The word error rate (WER) of an automatic speech recognition (ASR) system
increases when a mismatch occurs between the training and the testing
conditions due to the noise, etc. In this case, the acoustic information can be
less reliable. This work aims to improve ASR by modeling long-term semantic
relations to compensate for distorted acoustic features. We propose to perform
this through rescoring of the ASR N-best hypotheses list. To achieve this, we
train a deep neural network (DNN). Our DNN rescoring model is aimed at
selecting hypotheses that have better semantic consistency and therefore lower
WER. We investigate two types of representations as part of input features to
our DNN model: static word embeddings (from word2vec) and dynamic contextual
embeddings (from BERT). Acoustic and linguistic features are also included. We
perform experiments on the publicly available dataset TED-LIUM mixed with real
noise. The proposed rescoring approaches give significant improvement of the
WER over the ASR system without rescoring models in two noisy conditions and
with n-gram and RNNLM
Fast Channel and Noise Compensation in the Spectral Domain
Colloque avec actes et comité de lecture. internationale.International audienceWe compare in this work several methods for fast adaptation of speech models to convolutional and additive noise. The tested algorithms are Parallel Model Combination (PMC), Cepstral Mean Subtraction (CMS), and an algorithm that combines PMC and CMS in the spectral domain. Experiments are realized on a natural numbers recognition task in French. We have trained the acoustic models on the SPEECHDAT database (recorded through telephone lines), and we have tested the system on the VODIS database (recorded in three different cars)
Detection of Phone Boundaries for Non-Native Speech using French-German Models
International audienceWithin the framework of computer assisted foreign language learning for the French/German pair, we evaluate different HMM phone models for detecting accurate phone boundaries. The optimal parameters are determined by minimizing on the non-native speech corpus the number of phones whose boundaries are shifted by more than 20 ms compared to the manual boundaries. We observe that the best performance was obtained by combining a French native HMM model with an automatically selected German native HMM model
Domain Classification-based Source-specific Term Penalization for Domain Adaptation in Hate-speech Detection
State-of-the-art approaches for hate-speech detection usually exhibit poor
performance in out-of-domain settings. This occurs, typically, due to
classifiers overemphasizing source-specific information that negatively impacts
its domain invariance. Prior work has attempted to penalize terms related to
hate-speech from manually curated lists using feature attribution methods,
which quantify the importance assigned to input terms by the classifier when
making a prediction. We, instead, propose a domain adaptation approach that
automatically extracts and penalizes source-specific terms using a domain
classifier, which learns to differentiate between domains, and
feature-attribution scores for hate-speech classes, yielding consistent
improvements in cross-domain evaluation.Comment: COLING 2022 pre-prin
Semi-automatic phonetic labelling of large corpora
International audienceThe aim of the present paper is to present a methodology to semi-automatically label large corpora. This methodology is based on three main points: using several concurrent automatic stochastic labellers, decomposing the labelling of the whole corpus into an iterative refining process and building a labelling comparison procedure which takes into account phonologic and acoustic-phonetic rules to evaluate the similarity of the various labelling of one sentence. After having detailed these three points, we describe our HMM-based labelling tool and we describe the application of that methodology to the Swiss French POLYPHON database
RNN Language Model Estimation for Out-of-Vocabulary Words
International audienceOne important issue of speech recognition systems is Out-of Vocabulary words (OOV). These words, often proper nouns or new words, are essential for documents to be transcribed correctly. Thus, they must be integrated in the language model (LM) and the lexicon of the speech recognition system. This article proposes new approaches to OOV proper noun probability estimation using Recurrent Neural Network Language Model (RNNLM). The proposed approaches are based on the notion of closest in-vocabulary (IV) words (list of brothers) to a given OOV proper noun. The probabilities of these words are used to estimate the probabilities of OOV proper nouns thanks to RNNLM. Three methods for retrieving the relevant list of brothers are studied. The main advantages of the proposed approaches are that the RNNLM is not retrained and the architecture of the RNNLM is kept intact. Experiments on real text data from the website of the Euronews channel show relative perplexity reductions of about 14% compared to baseline RNNLM
Out-of-Vocabulary Word Probability Estimation using RNN Language Model
International audienceOne important issue of speech recognition systems is Out-of Vocabulary words (OOV). These words, often proper nouns or new words, are essential for documents to be transcribed correctly. Thus, they must be integrated in the language model (LM) and the lexicon of the speech recognition system. This article proposes new approaches to OOV proper noun estimation using Recurrent Neural Network Language Model (RNNLM). The proposed approaches are based on the notion of closest in-vocabulary (IV) words (list of brothers) to a given OOV proper noun. The probabilities of these words are used to estimate the probabilities of OOV proper nouns thanks to RNNLM. Three methods for retrieving the relevant list of brothers are studied. The main advantages of the proposed approaches are that the RNNLM is not retrained and the architecture of the RNNLM is kept intact. Experiments on real text data from the website of the Euronews channel show perplexity reductions of about 14% relative compared to baseline RNNLM
- …